Monte Carlo Simulations and LLN

Edward Vytlacil

Monte Carlo Simulations

Monte Carlo Analysis (for this course)

  • Use computer simulation to stochastically approximate some expected value.
    • Since the expected value of an indicator variable is a probability, the same method can stochastically approximate the probability of an event.
    • Relies critically on the Law of Large Numbers.
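As a minimal sketch of the idea (approximate an expected value by the average of many simulated draws), consider \(\mathbb{E}[X]\) for one roll of a fair die, whose true value is \(3.5\); R's sample() function is introduced later in these slides:

```r
# Approximate E[X] for one roll of a fair die; true value is 3.5.
set.seed(551)                                  # for replicability
rolls <- sample(1:6, 100000, replace = TRUE)   # 100,000 simulated rolls
mean(rolls)                                    # stochastic approximation of E[X]
```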

Monte Carlo Simulations

Monte Carlo Analysis (for this course)

  • Use computer simulation to stochastically approximate some expected value.
    • Use when deriving an exact solution is difficult or infeasible.
    • Also sometimes used to corroborate analytical analysis.
    • Often not computationally efficient.

Monte Carlo Simulations

MC simulations used to:

  • Simulate expected payoff from some strategy;
  • Study properties of estimators and inference procedures;
  • Study quality of asymptotic approximations . . .

Monte Carlo Simulations

MC simulations used to:

  • Implement some inference procedures
    • e.g., randomization inference,
    • e.g., bootstrap inference;
  • Implement some estimation procedures
    • For example, creating training and test samples in machine learning.
    • We will use this later in the course for K-fold cross-validation.

Pass line bet in craps

  • Shooter rolls two dice,

  • If sum of dice is 7 or 11, shooter wins,

  • If sum of dice is 2, 3, or 12, casino wins,

  • Otherwise, the shooter keeps rolling until the sum equals the original sum (shooter wins) or the sum is 7 (casino wins).

What is the probability that shooter wins?

Flowchart for Craps

flowchart LR
  D -->|No| F(Roll Again)
  subgraph Roll Again
    F(Roll Again) --> G[Roll 7?]
    G -->|Yes| H{Lose}
    G -->|No| I[Roll = 1st roll?]
    I -->|Yes| J{Win}
    I -->|No| F
  end
  subgraph 1st Roll
    A(Roll Dice) --> B[Roll 7 or 11?]
    B -->|Yes| C{Win}
    B -->|No| D[Roll 2, 3, or 12?]
    D -->|Yes| E{Lose}
  end

Pass line bet in craps

  • In this example, could use probability theory to derive that the probability of winning is \(244/495 \approx 0.4929\).

  • In many problems, deriving an exact solution is not feasible.

Win on first roll

  • Let’s start with simpler problem:
    what is the probability that shooter wins on first roll?

    • Can derive that probability equals \(2/9 \approx 0.2222\)

    • Now we use Monte Carlo simulation to approximate that probability.

Rolling a Die

  • First step: simulating one roll of a die:

    • R code: sample(1:6,1), which randomly draws one element from {1,2,3,4,5,6}, each element equally likely.
sample(1:6,1)
[1] 6

Rolling a Die

  • Rolls are random, so each call can give different results:
sample(1:6,1)
[1] 3
sample(1:6,1)
[1] 1
sample(1:6,1)
[1] 5
sample(1:6,1)
[1] 1

Are results really random?

  • Computers are deterministic.

    • To create truly random numbers, one can use a physical process, e.g., radioactive decay.

    • Not necessary for our purposes, and not what R is doing…

    • R is generating pseudo-random numbers

R generating pseudo-random numbers

  • R is generating pseudo-random numbers

    • starting point is a seed; by default R derives it from the system clock (the time, at high precision, when the process runs) together with the process ID;
    • algorithm applied to seed to obtain pseudo-random numbers.
    • not actually random, as deterministic given seed, but close enough to random for our purposes.
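To make "algorithm applied to seed" concrete, here is a toy linear congruential generator, one of the oldest pseudo-random schemes. This is illustrative only: R's default generator is the Mersenne Twister, not an LCG, and the function name is mine.

```r
# Toy linear congruential generator (Park-Miller constants).
# Illustrative only -- R's actual default RNG is the Mersenne Twister.
lcg <- function(n, seed) {
  m <- 2^31 - 1          # modulus
  a <- 16807             # multiplier
  x <- seed
  out <- numeric(n)
  for (i in 1:n) {
    x <- (a * x) %% m    # deterministic update of the state
    out[i] <- x / m      # rescale to (0, 1)
  }
  out
}
lcg(3, seed = 551)       # same seed ...
lcg(3, seed = 551)       # ... same "random" numbers
```

The output is entirely determined by the seed, yet successive values bounce around (0, 1) with no obvious pattern, which is all Monte Carlo work requires.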

Replication?

  • You may obtain different pseudo-random numbers each time you run your code, so neither you nor others can replicate your results.

  • For replication purposes, fix the seed with set.seed

set.seed(551)
sample(1:6,1)
sample(1:6,1)
sample(1:6,1)
[1] 2
[1] 4
[1] 5
set.seed(551)
sample(1:6,1)
sample(1:6,1)
sample(1:6,1)
[1] 2
[1] 4
[1] 5

Rolling Two Dice

  • Now simulating rolling two dice:

    • sample(1:6,2,replace=TRUE) randomly draws two elements from {1,2,3,4,5,6} with replacement, each element equally likely.
set.seed(551)
sample(1:6,2,replace=TRUE)
[1] 2 4
  • Sum the two dice
set.seed(551)
sum(sample(1:6,2,replace=TRUE))
[1] 6

Approximate Prob. Win on 1st Roll

  • Shooter wins on first roll if sum of dice is 7 or 11.

  • How do we approximate the probability of winning on the first roll?

  • Simulate many rolls and compute the fraction of rolls whose sum is 7 or 11.

Approximate Prob. Win on 1st Roll

  • More formally
    • Let \(X\) denote the indicator variable for winning on the first roll.
    • Let \(p = \Pr[X=1]\), the probability of winning.
    • Let \(X_1, \ldots, X_n\) denote \(n\) replications of \(X\).
    • Let \(\bar{X}_n\) denote the sample mean, i.e., the fraction of replications with \(X_i = 1\).
    • Use \(\bar{X}_n\) to approximate \(\mathbb{E}[X]=\Pr[X=1]\).

Win first roll

  • One simulation of whether win on first roll:
sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) 
[1] TRUE

Approximate Prob. Win on 1st Roll

  • Replicate that simulation 10 times
set.seed(551)
replicate(10,sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) )
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
  • Compute fraction of replications where win:
set.seed(551)
mean(replicate(10,sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) ))
[1] 0.1
  • Use 0.1 to approximate probability win first roll.

Approximate Prob. Win on 1st Roll

  • True probability win on first roll: \(2/9 \approx 0.222\).

  • Fraction of 10 simulations where won: 0.1.

    • Not a very good approximation.

Approximate Prob. Win on 1st Roll

  • Additional problem:
    Can get very different approximation if rerun code:
mean(replicate(10,sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) ))
[1] 0.4
mean(replicate(10,sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) ))
[1] 0.1
mean(replicate(10,sum(sample(1:6,2,replace=TRUE)) %in% c(7, 11) ))
[1] 0.5
  • Note that \(\mathbb{E}(\bar{X}_n)=p\), but \(\mbox{Var}(\bar{X}_n)= p(1-p)/n\).

Approximate Prob. Win on 1st Roll

  • Additional problem:
    Can get very different approximation if rerun code:

    • Note that \(\mathbb{E}(\bar{X}_n)=p\), but \(\mbox{Var}(\bar{X}_n)= p(1-p)/n\), where \(p=2/9\).
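Plugging in the numbers shows why \(n=10\) is so noisy: the standard deviation of \(\bar{X}_n\) is \(\sqrt{p(1-p)/n}\), about \(0.13\) at \(n=10\), so two standard deviations already covers roughly \(0\) to \(0.5\), consistent with the approximations \(0.1\), \(0.4\), \(0.5\) above.

```r
# Standard deviation of Xbar_n for p = 2/9 at several replication counts
p <- 2/9
n <- c(10, 100, 10000)
round(sqrt(p * (1 - p) / n), 4)   # shrinks at rate 1/sqrt(n)
# [1] 0.1315 0.0416 0.0042
```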

Stochastic Approximation:

  • Consider increasing number of replications
mean(replicate(100,sum(sample(1:6,2,replace=TRUE))
               %in% c(7, 11) ))
[1] 0.21
mean(replicate(1000,sum(sample(1:6,2,replace=TRUE)) 
               %in% c(7, 11) ))
[1] 0.213
mean(replicate(10000,sum(sample(1:6,2,replace=TRUE))
               %in% c(7, 11) ))
[1] 0.2178
mean(replicate(100000,sum(sample(1:6,2,replace=TRUE)) 
               %in% c(7, 11) ))
[1] 0.22274

Stochastic Approximation:

  • Plotting results for \(n=1\) to \(n=10000\).
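Such a running-mean plot can be produced along the following lines (a sketch; variable names are mine):

```r
# Running mean of the win-on-first-roll indicator for n = 1, ..., 10000
set.seed(551)
n.max <- 10000
wins  <- replicate(n.max, sum(sample(1:6, 2, replace = TRUE)) %in% c(7, 11))
xbar  <- cumsum(wins) / seq_len(n.max)      # Xbar_n for each n
plot(xbar, type = "l", xlab = "n", ylab = "Fraction of wins")
abline(h = 2/9, lty = 2)                    # true probability 2/9
```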

Stochastic Approximation:

  • What happens if repeat this MC simulation?

Stochastic Approximation:

  • What if we repeat this simulation 1,000 times?

Stochastic Approximation:

Use Theory to Understand?

  • Recall Convergence in Probability

    • We say a sequence \(\bar{X}_n\) converges in probability to \(\mu\), written \(\bar{X}_n \stackrel{p}{\rightarrow} \mu\), if, for every \(\epsilon>0\), \[ \lim_{n \rightarrow \infty} \Pr[ | \bar{X}_n - \mu | \ge \epsilon] = 0.\]
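The definition can itself be checked by simulation for the win-on-first-roll indicator (a sketch; the choice of 500 outer replications per \(n\) and \(\epsilon = 0.05\) is mine): the deviation probability \(\Pr[|\bar{X}_n - p| \ge \epsilon]\) shrinks as \(n\) grows.

```r
# Approximate Pr[|Xbar_n - p| >= eps] for several n, with p = 2/9
set.seed(551)
p   <- 2/9
eps <- 0.05
xbar <- function(n) {
  mean(replicate(n, sum(sample(1:6, 2, replace = TRUE)) %in% c(7, 11)))
}
dev.probs <- sapply(c(10, 100, 1000), function(n) {
  mean(replicate(500, abs(xbar(n) - p) >= eps))  # deviation probability
})
dev.probs
```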

Law of Large Numbers

  • Recall LLN for i.i.d. data.

    • Let \(X_1, \ldots, X_n\) denote i.i.d. random variables with \(\mathbb{E}[X_i]=\mu\) finite. Then \(\bar{X}_n \stackrel{p}{\rightarrow} \mu\).
  • When \(X_i\) is an indicator variable, \(\mathbb{E}[X_i]=p\) with \(0 \le p \le 1\), so \(\mathbb{E}[X_i]=\mu\) always exists and is finite. Thus, by the LLN, \(\bar{X}_n \stackrel{p}{\rightarrow} p\), i.e., \(\bar{X}_n\) approximates \(p\) arbitrarily well for \(n\) sufficiently large.

Original Problem

  • What is the probability that shooter wins pass line bet in craps?

  • First, write a function to simulate the pass line bet.

Flowchart for Craps

flowchart LR
  D -->|No| F(Roll Again)
  subgraph Roll Again
    F(Roll Again) --> G[Roll 7?]
    G -->|Yes| H{Lose}
    G -->|No| I[Roll = 1st roll?]
    I -->|Yes| J{Win}
    I -->|No| F
  end
  subgraph 1st Roll
    A(Roll Dice) --> B[Roll 7 or 11?]
    B -->|Yes| C{Win}
    B -->|No| D[Roll 2, 3, or 12?]
    D -->|Yes| E{Lose}
  end

Function to simulate pass line bet

f.craps <- function(){
    # simulate first roll of dice
    sum0 <- sum(sample(1:6,2,replace=TRUE))
    # determine if win/lose on first roll
    # determine outcome if don't win/lose on first roll
    return(Win)
}

Function to simulate pass line bet

f.craps <- function(){
    # simulate first roll of dice
    sum0 <- sum(sample(1:6,2,replace=TRUE))
    # determine if win/lose on first roll
    if (sum0  %in% c(7, 11)){
             Win <- 1
    } else if (sum0 %in% c(2, 3, 12)){
             Win <- 0 
    } else{
    # determine outcome if don't win/lose on first roll
    }
    return(Win)
}

Function to simulate pass line bet

f.craps <- function(){
    # simulate first roll of dice
    sum0 <- sum(sample(1:6,2,replace=TRUE))
    # determine if win/lose on first roll
    if (sum0  %in% c(7, 11)){
             Win <- 1
    } else if (sum0 %in% c(2, 3, 12)){
             Win <- 0 
    } else{
    # determine outcome if don't win/lose on first roll
    while(TRUE){
    # keep rolling until someone wins
    }
    }
    return(Win)
}

Function to simulate pass line bet

f.craps <- function(){
    # simulate first roll of dice
    sum0 <- sum(sample(1:6,2,replace=TRUE))
    # determine if win/lose on first roll
    if (sum0  %in% c(7, 11)){
            Win <- 1
    } else if (sum0 %in% c(2, 3, 12)){
            Win <- 0 
    } else{
    # determine outcome if don't win/lose on first roll
    while(TRUE){
    # keep rolling until someone wins
      sum1 <- sum(sample(1:6,2,replace=TRUE))
      if (sum1 == sum0){
            Win <- 1
            break
        } else if (sum1 == 7){
            Win <- 0
            break
        }
    }
    }
    return(Win)
}

Simulate Prob Win Pass Line Bet

sim.prob <- mean(replicate(100000,f.craps()))
sim.prob 
[1] 0.49474
  • MC simulated probability: 0.49474
  • True probability: \(244/495 \approx 0.4929\).

Can draw from . . .

  • Common distributions in R:

     Distribution   Function                Returns
     ------------   ---------------------   ---------------------------------------------------------------
     Binomial       rbinom(n,size,prob)     draw \(n\) times from \(\mbox{Binom}(\mbox{size},\mbox{prob})\)
     Normal         rnorm(n,mean=0,sd=1)    draw \(n\) times from \(N(\mbox{mean},\mbox{sd})\)
     Student-t      rt(n,df)                draw \(n\) times from the \(t_{df}\) distribution
     Uniform        runif(n,min=0,max=1)    draw \(n\) times from \(\mbox{Unif}[\mbox{min},\mbox{max}]\)

General Procedure

For some function \(f\),

  • Let \(X\) denote random variable (or random vector).

  • Let \(X_1, \ldots, X_R\) denote \(R\) replications of \(X\).

  • Construct \(f(X_1), . . . ,f(X_R)\).

  • Let \(\overline{f(X)}_R\) denote the sample mean of \(f(X_j)\) across the \(R\) replications.

  • Use \(\overline{f(X)}_R\) to approximate \(\mathbb{E}[f(X)]\).

Examples:

  • Approximate \(\mathbb{E}[X]\) for \(X \sim N(0,1)\).
mean(rnorm(100000))
[1] 0.0003417313
  • Approximate \(\mathbb{E}[\max\{0,X\}]\) for \(X \sim N(0,1)\).
f.1 <- function(x){max(0,x)} 
mean(replicate(100000,f.1(rnorm(1))))
[1] 0.4013004

Examples:

  • Approximate \(\Pr[-1.96 < X < 1.96]\) for \(X \sim\) Cauchy.
# mean of 100000 replications
f.2 <- function(x){(-1.96 < x & x < 1.96)} 
mean(replicate(100000,f.2(rcauchy(1))))
[1] 0.69887
  • Why is it a bad idea to approximate \(\mathbb{E}[X]\) for \(X \sim\) Cauchy? (The Cauchy distribution has no finite mean, so the LLN does not apply.)
# mean of 100000 draws
mean(rcauchy(100000))
[1] 0.7864364

Often Replicating Samples. . .

  • Often sampling \(X\) where \(X\) itself is determined by underlying sample.

  • For example, consider simulating the mean of \(N\) weighted coin flips, i.e., \(X=\bar{Y}_N\) where \(Y_i \sim \mbox{Bernoulli}(p)\).

    • Create \(R\) replication samples, each a sample of \(N\) coin flips.

    • Let \(X_j = \bar{Y}_{j,N}\), the sample mean of the \(N\) weighted coin flips in replication sample \(j\).

  • Proceed as before.

Note two dimensions, \(N\) (size of each sample) and \(R\) (number of replications).

Examples:

Let \(\bar{Y}_{N}\) be the mean of \(N\) weighted coin flips with \(\Pr[Y_i=1]=p_0\). Consider:

  • Size of the \(\alpha=0.05\) level test of \(H_0: p=p_0\) vs. the two-sided alternative, using the t-test based on asymptotic normality.

  • Power of that test against \(p \ne p_0\).

Examples:

Let \(\bar{Y}_{10}\) be the mean of \(10\) fair coin flips. Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{10} \frac{\bar{Y}_{10} -0.5}{\sqrt{\bar{Y}_{10} * (1-\bar{Y}_{10})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.5))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}

mean(replicate(100000,abs(f.tstat0(10))>1.96))
[1] 0.11034

Examples:

Let \(\bar{Y}_{100}\) be the mean of \(100\) fair coin flips. Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{100} \frac{\bar{Y}_{100} -0.5}{\sqrt{\bar{Y}_{100} * (1-\bar{Y}_{100})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.5))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(100))>1.96))
[1] 0.05884

Examples:

Let \(\bar{Y}_{1000}\) be the mean of \(1000\) fair coin flips. Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{1000} \frac{\bar{Y}_{1000} -0.5}{\sqrt{\bar{Y}_{1000} * (1-\bar{Y}_{1000})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.5))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(1000))>1.96))
[1] 0.05439

Examples:

Let \(\bar{Y}_{10}\) be the mean of \(10\) weighted coin flips with \(p=0.01\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{10} \frac{\bar{Y}_{10} -0.01}{\sqrt{\bar{Y}_{10} * (1-\bar{Y}_{10})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.01))
  sqrt(n)*(phat - 0.01)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(10))>1.96))
[1] 0.90474

Examples:

Let \(\bar{Y}_{100}\) be the mean of \(100\) weighted coin flips with \(p=0.01\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{100} \frac{\bar{Y}_{100} -0.01}{\sqrt{\bar{Y}_{100} * (1-\bar{Y}_{100})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.01))
  sqrt(n)*(phat - 0.01)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(100))>1.96))
[1] 0.36677

Examples:

Let \(\bar{Y}_{1000}\) be the mean of \(1000\) weighted coin flips with \(p=0.01\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{1000} \frac{\bar{Y}_{1000} -0.01}{\sqrt{\bar{Y}_{1000} * (1-\bar{Y}_{1000})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.01))
  sqrt(n)*(phat - 0.01)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(1000))>1.96))
[1] 0.07338

Examples:

Now consider test statistic for null \(p=0.5\) when true \(p=0.4\). Suppose \(N=10\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{10} \frac{\bar{Y}_{10} -0.5}{\sqrt{\bar{Y}_{10} * (1-\bar{Y}_{10})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.4))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(10))>1.96))
[1] 0.1797

Examples:

Now consider test statistic for null \(p=0.5\) when true \(p=0.4\). Suppose \(N=100\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{100} \frac{\bar{Y}_{100} -0.5}{\sqrt{\bar{Y}_{100} * (1-\bar{Y}_{100})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.4))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(100))>1.96))
[1] 0.54335

Examples:

Now consider test statistic for null \(p=0.5\) when true \(p=0.4\). Suppose \(N=1,000\). Using \(100,000\) replications,

  • approximate \(\Pr\left[~ \left | \sqrt{1000} \frac{\bar{Y}_{1000} -0.5}{\sqrt{\bar{Y}_{1000} * (1-\bar{Y}_{1000})}} \right | > 1.96\right]\)
f.tstat0 <- function(n){
  phat <- mean(rbinom(n,1,p=0.4))
  sqrt(n)*(phat - 0.5)/sqrt(phat * (1-phat))
}
  
mean(replicate(100000,abs(f.tstat0(1000))>1.96))
[1] 0.99998

What if we simulate the sample mean of a Cauchy?

  • Plotting results for \(n=1\) to \(n=10000\).

What if we draw from a Cauchy (2nd replication)?

  • What happens if repeat this MC simulation?

Stochastic Approximation:

  • What if we repeat this simulation 1,000 times?

Sample mean of a Cauchy

  • Across 1,000 draws of \(\bar{X}_{10,000}\), each mean based on a sample of \(10,000\) i.i.d. Cauchy r.v.:
draws.c <-   replicate(1000,mean(rcauchy(10000)))
var(draws.c) 
[1] 3274.495
min(draws.c)
[1] -114.4349
max(draws.c)
[1] 7595.922
mean(draws.c>0)
[1] 0.522
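For contrast, repeating the same experiment with \(N(0,1)\) draws, where \(\mathbb{E}[X]=0\) exists and the LLN applies, yields sample means tightly concentrated around \(0\):

```r
# Same experiment with standard normal draws: the LLN applies here
set.seed(551)
draws.n <- replicate(1000, mean(rnorm(10000)))
var(draws.n)     # roughly 1/10000, vs. thousands for the Cauchy
range(draws.n)   # every sample mean is very close to 0
```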